Abstract. We study the relationship between the number of minus signs in a generalized
sumset, $A + \cdots + A - \cdots - A$, and its cardinality; without loss of generality we may assume
there are at least as many positive signs as negative signs. As addition is commutative
and subtraction is not, we expect that for most $A$ a combination with more minus signs
has more elements than one with fewer; however, recently Iyer, Lazarev, Miller and Zhang
[ILMZ] proved that a positive percentage of the time the combination with fewer minus
signs can have more elements. Their analysis involves choosing sets $A$ uniformly at random
from $\{0, \dots, N\}$; this is equivalent to independently choosing each element of $\{0, \dots, N\}$ to
be in $A$ with probability 1/2. We investigate what happens when instead each element is
chosen with probability $p(N)$, with $\lim_{N \to \infty} p(N) = 0$. We prove that the set with more
minus signs is larger with probability 1 as $N \to \infty$ if $p(N) = cN^{-\delta}$ for $\delta \ge \frac{h-1}{h}$, where $h$ is
the number of total summands in $A + \cdots + A - \cdots - A$, and explicitly quantify their relative
sizes. The results generalize earlier work of Hegarty and Miller [HM], and we see a phase
transition in the behavior of the cardinalities when $\delta = \frac{h-1}{h}$.
1. Introduction



1.1. Previous Results. Let $A$ be a subset of the integers. We define the sumset $A + A$
and the difference set $A - A$ by

$$A + A = \{a_1 + a_2 : a_i \in A\}, \qquad A - A = \{a_1 - a_2 : a_i \in A\}. \tag{1.1}$$

Many important problems in number theory are related to these sets and their generalizations.
For example, if $P$ denotes the set of primes and $K$ the set of $k$th powers of positive
integers, then the Goldbach conjecture is equivalent to $P + P$ containing all even numbers,
the twin prime conjecture is that $P - P$ contains 2 infinitely often, Fermat's Last Theorem is that
$(K + K) \cap K$ is empty if $k \ge 3$, and Waring's problem is that for each $k$ there is an $s$ such
that $K + \cdots + K$ ($s$ times) contains all positive integers.

Note the last problem involves more than one binary operation; the main goal of this paper 
is to explore what happens to generalized sumsets in different models. Before stating our 
results, we review some previous work. As addition is commutative and subtraction is not, a 
typical pair of integers generates two differences but only one sum. It is therefore reasonable 
to expect a generic finite set A has a larger difference set than sumset. If this is the case 
then we say A is difference dominated, while if the two sets have the same size we say the 
set is balanced, and if the sumset is larger then A is sum dominated (also called a more 
sums than differences (MSTD) set). It was conjectured that if $A$ is chosen uniformly
at random from $\{0, \dots, N\}$ then as $N \to \infty$ almost all sets are difference dominated. In
2007, however, Martin and O'Bryant [MO] disproved this conjecture by showing a positive
percentage of sets are sum dominated. The percentage is small, around $4.5 \cdot 10^{-4}$ [Zh].

While these results imply that sum dominated sets are not too rare, this is a consequence of
how the sets are chosen. An equivalent formulation is that each element of $I_N := \{0, \dots, N\}$
is chosen to be in $A$ with probability 1/2. With high probability a randomly chosen subset
$A$ has approximately $N/2$ elements (with errors of size $\sqrt{N}$). Thus the density of a generic
subset relative to the underlying set $I_N$ is quite high, typically about 1/2. Because it is so high, when
we look at the sumset (resp., difference set) of a typical $A$ there are many ways of expressing
elements as a sum (resp., difference) of two elements of $A$. Almost all possible sums and
differences are realized; the expected number of missing differences is 6, while the expected
number of missing sums is 10. Thus, a typical set needs just a small nudge to become sum
dominated. This can be accomplished by appropriately choosing the fringe elements of $A$
(the elements near 0 and $N$), as almost surely changes at the fringes do not affect whether
or not most possible sums and differences are realized.

This observation suggests that instead of taking each element of $I_N$ with probability 1/2
(or any fixed, non-zero probability), we should instead explore what happens when all of
these elements are chosen independently with probability $p(N)$, where $p$ is some function
tending to zero; this is a binomial model with parameter $p(N)$. Such an analysis was done by
Hegarty and Miller [HM] in 2009. They showed that if $p(N) = cN^{-\delta}$ for some $\delta \in (0, 1)$, then
almost surely $A$ is difference dominated. The analysis breaks into three cases based on the
probability for choosing elements in $A$. The authors study fast decay ($\delta > 1/2$), critical
decay ($\delta = 1/2$), and slow decay ($\delta < 1/2$). There is a phase transition at $\delta = 1/2$, leading
to the name critical decay.
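
As an illustration of the binomial model just described, the following Python sketch samples $A \subseteq \{0, \dots, N\}$ with inclusion probability $p$ and classifies it by comparing $|A + A|$ with $|A - A|$. The function names are our own choices for illustration and are not taken from [HM]; this is a small experiment under those assumptions, not part of any proof.

```python
import random

def random_subset(N, p, rng):
    """Include each element of {0, ..., N} independently with probability p."""
    return {a for a in range(N + 1) if rng.random() < p}

def sumset(A):
    """A + A."""
    return {a + b for a in A for b in A}

def diffset(A):
    """A - A."""
    return {a - b for a in A for b in A}

def classify(A):
    """Compare |A + A| with |A - A|."""
    s, d = len(sumset(A)), len(diffset(A))
    if d > s:
        return "difference dominated"
    if s > d:
        return "sum dominated"
    return "balanced"
```

For a sparse sample one can take, for instance, `p = c * N ** (-delta)` with illustrative values such as `N = 2000`, `c = 1.0`, `delta = 0.75` and call `classify(random_subset(N, p, random.Random(0)))`.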

Before stating their results we first introduce some definitions, notation, conventions, and 
standard facts that we use in our results as well. 
We start with notation for sizes. By $f(x) = O(g(x))$ we mean that there exist constants
$x_0$ and $C$ such that for all $x \ge x_0$, $|f(x)| \le C g(x)$. We write $f(x) = \Theta(g(x))$ if both
$f(x) = O(g(x))$ and $g(x) = O(f(x))$. If $\lim_{x \to \infty} f(x)/g(x) = 0$ then we write $f(x) = o(g(x))$,
which is equivalent to $f(x) \ll g(x)$.

As the fundamental objects of study are sizes of sets, we need a way to denote asymptotic
behavior. Let $X$ be a real-valued random variable depending on some positive integer parameter
$N$, and let $f(N)$ be some real-valued function. By "$X \sim f(N)$" we mean that, for
any $\epsilon_1, \epsilon_2 > 0$, there exists $N_{\epsilon_1, \epsilon_2} > 0$ such that, for all $N > N_{\epsilon_1, \epsilon_2}$,

$$P\big(X \notin [(1 - \epsilon_1) f(N), (1 + \epsilon_1) f(N)]\big) < \epsilon_2. \tag{1.2}$$

We now state the main past result, which we will generalize. 

Theorem 1.1 (Hegarty-Miller [HM]). Let $p : \mathbb{N} \to (0, 1)$ be any function such that

$$N^{-1} = o(p(N)) \quad \text{and} \quad p(N) = o(1). \tag{1.3}$$

For each $N \in \mathbb{N}$ let $A$ be a random subset of $I_N$ chosen according to a binomial distribution
with parameter $p(N)$. Then, as $N \to \infty$, the probability that $A$ is difference dominated tends
to one.

More precisely, let $\mathscr{S}$, $\mathscr{D}$ denote respectively the random variables $|A + A|$ and $|A - A|$.
Then the following three situations arise:

(i) $p(N) = o(N^{-1/2})$: Then

$$\mathscr{S} \sim \frac{(N \cdot p(N))^2}{2} \quad \text{and} \quad \mathscr{D} \sim 2\mathscr{S} \sim (N \cdot p(N))^2. \tag{1.4}$$

(ii) $p(N) = c \cdot N^{-1/2}$ for some $c \in (0, \infty)$: Define the function $g : (0, \infty) \to (0, 2)$ by

$$g(x) := 2 \left( \frac{e^{-x} - (1 - x)}{x} \right). \tag{1.5}$$

Then

$$\mathscr{S} \sim g\left(\frac{c^2}{2}\right) N \quad \text{and} \quad \mathscr{D} \sim g(c^2) N. \tag{1.6}$$

(iii) $N^{-1/2} = o(p(N))$: Let $\mathscr{S}^c := (2N + 1) - \mathscr{S}$, $\mathscr{D}^c := (2N + 1) - \mathscr{D}$. Then

$$\mathscr{S}^c \sim 2 \cdot \mathscr{D}^c \sim \frac{4}{p(N)^2}. \tag{1.7}$$

Notice there is a phase transition at $\delta = 1/2$, where $|A - A|$ goes from almost surely having
twice as many elements as $|A + A|$ (when $\delta > 1/2$) to having the same number of elements to
first order (when $\delta < 1/2$); further, an explicit, tractable formula is obtained for the relative
sizes when $\delta = 1/2$ as a simple function of $c$.
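
To make the critical-decay formula concrete, here is a small Python sketch that evaluates the function $g(x) = 2(e^{-x} - (1 - x))/x$ of Theorem 1.1(ii), as we read it, together with the first-order predictions $g(c^2/2) N$ and $g(c^2) N$. The helper name `predicted_sizes` is hypothetical and only for illustration.

```python
import math

def g(x):
    """g(x) = 2 (e^{-x} - (1 - x)) / x for x > 0."""
    # math.expm1(-x) equals e^{-x} - 1 and avoids cancellation for small x.
    return 2.0 * (math.expm1(-x) + x) / x

def predicted_sizes(N, c):
    """First-order predictions (|A+A|, |A-A|) when p(N) = c / sqrt(N)."""
    return g(c * c / 2.0) * N, g(c * c) * N
```

For small $c$ the predicted ratio $|A - A| / |A + A|$ is close to 2, and for large $c$ it is close to 1, matching the two regimes on either side of the phase transition.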

The goal of this paper is to generalize this theorem to arbitrary combinations of sums and 
differences. 
1.2. Results. Before stating our results, we need some combinatorial results. We use the
extended definition of the binomial coefficient, setting $\binom{a}{b} = 0$ for integers $0 \le a < b$. A
central result, which we use again and again, is the stars and bars (or cookie) problem: for
any pair of positive integers $n, k$, the number of distinct $k$-tuples of non-negative integers
that sum to $n$ is $\binom{n+k-1}{k-1}$. Note this is equivalent to counting the number of solutions in
non-negative integers to $x_1 + \cdots + x_k = n$. This is readily found. If we choose $k - 1$ objects
from $n + k - 1$ (there are $\binom{n+k-1}{k-1}$ ways to do so), we partition the remaining $n$ objects into
$k$ sets, and there is a one-to-one correspondence between these partitions and our desired
solutions.
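
The stars and bars count can be checked by brute force for small parameters. The sketch below (function name ours) enumerates all ordered $k$-tuples of non-negative integers summing to $n$ and can be compared with $\binom{n+k-1}{k-1}$.

```python
from itertools import product
from math import comb

def count_compositions(n, k):
    """Number of ordered k-tuples of non-negative integers that sum to n."""
    return sum(1 for t in product(range(n + 1), repeat=k) if sum(t) == n)
```

For example, `count_compositions(3, 2)` counts $(0,3), (1,2), (2,1), (3,0)$, in agreement with `comb(4, 1)`.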

In our investigations below we always choose elements for our set $A$ from $I_N := \{0, \dots, N\}$
independently with probability $p(N) = cN^{-\delta}$ for fixed $\delta \in (0, 1)$ and $c > 0$.

• Given a set $A$ we define its generalized sumset $A_{s,d}$ with $s$ sums and $d$ differences to
be $A + \cdots + A - \cdots - A$; as we are only interested in cardinalities we may always
assume $d \le s$.

• We write $|A_{s,d}|$ for its size. We always use $h$ for the number of summands, so $h = s + d$.

• An $h_{(s,d)}$-tuple is a set of $h = s + d$ integers, $\{a_1, \dots, a_s, a_{s+1}, \dots, a_h\}$.

• If the associated sum $\sum_{i=1}^{s} a_i - \sum_{j=1}^{d} a_{s+j}$ equals $n$ then we say the tuple generates $n$.
Note that the generalized sumset is the set of all numbers generated by $h_{(s,d)}$-tuples
of elements of $A$.

• Related to this is $R(n, s, d)$, which we define to be the number of ways to generate
$n$ through $h_{(s,d)}$-tuples of integers drawn from $\{0, \dots, N\}$. As $R(n, s, d)$ counts all
permutations equally, order matters; for example, if $a_1 + a_2 - a_3 = n$, then $R(n, 2, 1)$
counts $(a_1, a_2, a_3)$ and $(a_2, a_1, a_3)$ as two different entities. As $N$ is fixed throughout
our calculations and then sent to infinity only at the end, to simplify notation we
write $R(n, s, d)$, though really it should be $R_N(n, s, d)$ to emphasize this dependence.
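
The definitions in the list above can be made concrete with a brute-force sketch. The function names are ours; `representation_counts` enumerates all $(N+1)^h$ ordered tuples, so it is only meant for very small $N$ and $h$.

```python
from collections import Counter
from itertools import product

def generalized_sumset(A, s, d):
    """All values a_1 + ... + a_s - a_{s+1} - ... - a_{s+d} with every a_i in A."""
    values = {0}
    for _ in range(s):
        values = {v + a for v in values for a in A}
    for _ in range(d):
        values = {v - a for v in values for a in A}
    return values

def representation_counts(N, s, d):
    """Map n -> R(n, s, d), counting ordered tuples drawn from {0, ..., N}."""
    counts = Counter()
    for t in product(range(N + 1), repeat=s + d):
        counts[sum(t[:s]) - sum(t[s:])] += 1
    return counts
```

With $s = d = 1$ the first function returns the ordinary difference set, and summing the values of `representation_counts(N, s, d)` recovers the total number $(N+1)^{s+d}$ of ordered tuples.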

In the course of our investigations we encounter the following constants and functions. 
For $k$ a positive integer and $j \in (0, h/2)$, set

~ ^■^f(H)( i -¥r)* (i - 8) 

These constants emerge in our phase transition function

$$g(x; s, d) := \sum_{k=1}^{\infty} (-1)^{k-1} b_{h,k} \, x^{(s+d)k}, \tag{1.9}$$

which for $h = s + d \ge 2$ converges for all $x$.
Our main result is the following. 

Theorem 1.2. Let $h$ be a positive integer, $c > 0$ a real number, and choose pairs of integers
$(s_i, d_i)$ with $s_i \ge d_i$ and $s_i + d_i = h$; for definiteness let $d_1 > d_2$. Consider subsets $A \subset I_N$
where each element of $I_N$ is independently chosen to be in $A$ with probability $p(N) = cN^{-\delta}$.

• For $\delta > \frac{h-1}{h}$, the set $A_{s_i,d_i}$ with the larger $d_i$ is larger almost surely. In particular, as
$N \to \infty$ with probability one we have $|A_{s_1,d_1}| / |A_{s_2,d_2}| = (s_2! \, d_2!) / (s_1! \, d_1!) + o(1)$.

• If $\delta = \frac{h-1}{h}$ then almost surely $|A_{s_i,d_i}| \sim N g(c; s_i, d_i)$ (with $g$ defined in (1.9)), and
thus with probability one $|A_{s_1,d_1}| / |A_{s_2,d_2}|$ is $g(c; s_1, d_1) / g(c; s_2, d_2) + o(1)$.
Thus for two sets with the total number of summands fixed, the set with more minus signs
is larger almost surely when $\delta > \frac{h-1}{h}$, so there are more distinct elements in the generalized
sumset with more minus signs. There is a phase transition in the behavior when $\delta$ passes
from being greater than $\frac{h-1}{h}$ to equaling $\frac{h-1}{h}$.

The proof is similar to that in [HM], which does the $h = 2$ case. The idea is to bound
the number of times distinct $h_{(s,d)}$-tuples generate the same element. This allows us to
discount the number of repeated elements in the generalized sumset. If we already know
that most elements are distinct, then simple combinatorics allows us to compare their sizes;
however, as $\delta$ gets smaller, we choose more and more elements for $A$, which leads to more
repeated elements in the generalized sumset. The analysis is significantly easier when there
are fewer repeated generalized sums, as then the sizes of the two generalized sumsets are
well separated. Specifically, in the case of fast decay, the analysis follows from Chebyshev's
inequality. The case of critical decay is significantly more challenging and requires recent
strong concentration results. We first must show that we can estimate the number of $h_{(s,d)}$-tuples
with a constant sum by the number of $h_{(s,d)}$-tuples with $h$ distinct elements. We then
show that if we partition our $h_{(s,d)}$-tuples into equivalence classes based on the number of
other $h_{(s,d)}$-tuples with the same sum, the class of singletons is the largest, so most $h_{(s,d)}$-tuples
generate a unique integer. We then define our function $g(x; s, d)$ in terms of $|A_{s,d}|$.

In Section 2.1 we define $R(n, s, d)$ to count possible values for $A_{s,d}$. In Section 2.2 we bound
the expected number of repeated elements, and in Section 2.3 we show that the number of
repeated elements is close to its expected value. In Section 3.1 we study the case of fast
decay. In Section 3.2 we study the case of critical decay. We end with a discussion of future
work.



2. Strong Concentration 

In this section, we first derive formulas for quantities related to the number of $h_{(s,d)}$-tuples
generating a given number. These results are key ingredients in the strong concentration
analysis.

2.1. Determining $R(n, s, d)$. Our first step is to determine a tractable formula for $R(n, s, d)$,
the number of $h_{(s,d)}$-tuples of integers drawn from $\{0, \dots, N\}$ that generate $n$.

Lemma 2.1. Let $n' := n + dN$. We have

$$R(n, s, d) = \sum_{i=0}^{\lfloor n'/(N+1) \rfloor} (-1)^i \binom{h}{i} \binom{n' - i(N+1) + h - 1}{h - 1}. \tag{2.1}$$

Proof. We first assume $d = 0$, so all signs are positive and $n' = n$. By the stars and bars /
cookie problem, the number of ways to write $n$ as a sum of $h$ non-negative integers is $\binom{n+h-1}{h-1}$.
As it will be important later, it is worth noting that this treats $4 + 3 + 1$ and $3 + 4 + 1$ as
two different representations. Also, note this is equivalent to solving $x_1 + \cdots + x_h = n$
with each $x_i$ a non-negative integer. We desire each summand to be in $I_N := \{0, \dots, N\}$,
and thus $\binom{n+h-1}{h-1}$ may overcount. We remedy this by using inclusion-exclusion to remove
representations with summands exceeding $N$.

We first remove all representations where at least one summand exceeds $N$. There are
$\binom{h}{1}$ ways to choose which summand this is. We write that summand as $x_j = y_j + N + 1$,
and write $x_i = y_i$ for the remaining summands. Thus the number of representations where
summand $j$ exceeds $N$ and the other summands are at least zero is the number of solutions
to $y_1 + \cdots + y_h = n - (N + 1)$, which is just $\binom{n - (N+1) + h - 1}{h - 1}$. If instead $i$ summands are greater
than $N$, we would get $y_1 + \cdots + y_h = n - i(N + 1)$, for $\binom{n - i(N+1) + h - 1}{h - 1}$ solutions. The claim
now follows by inclusion-exclusion.

We only need trivial modifications if $d > 0$. For the $d$ elements occurring with a minus
sign, $a_{s+1}, \dots, a_{s+d}$, write $a'_j = N - a_j$. Then

$$a_1 + \cdots + a_s - a_{s+1} - \cdots - a_{s+d} = n \tag{2.2}$$

becomes

$$a_1 + \cdots + a_s + a'_{s+1} + \cdots + a'_{s+d} = n + dN, \tag{2.3}$$

reducing us to the first case. □
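
The inclusion-exclusion count in the proof above can be compared with direct enumeration for small parameters. The following sketch is ours: `R_formula` evaluates the alternating sum with $n' = n + dN$, stopping once $n' - i(N+1)$ becomes negative, and `R_brute` tabulates all ordered tuples from $\{0, \dots, N\}$.

```python
from collections import Counter
from itertools import product
from math import comb

def R_formula(n, s, d, N):
    """Inclusion-exclusion count of ordered tuples from {0..N} generating n."""
    h = s + d
    n_shift = n + d * N  # n' in the lemma
    total = 0
    for i in range(h + 1):
        remainder = n_shift - i * (N + 1)
        if remainder < 0:  # no solutions with i summands forced above N
            break
        total += (-1) ** i * comb(h, i) * comb(remainder + h - 1, h - 1)
    return total

def R_brute(s, d, N):
    """Map n -> number of ordered tuples from {0..N} generating n."""
    counts = Counter()
    for t in product(range(N + 1), repeat=s + d):
        counts[sum(t[:s]) - sum(t[s:])] += 1
    return counts
```

The two agree on every $n$ in the small cases we tried, including values of $n$ outside $[-dN, sN]$, where both return 0.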

In our strong concentration applications we need not $R(n, s, d)$, but the closely related
quantity $R_{\text{distinct}}(n, s, d)$, which counts the number of representations of $n$ by $h = s + d$
distinct elements. The next lemma shows that these two quantities differ in a lower order
term (relative to $N$).

Lemma 2.2. The number of $h_{(s,d)}$-tuples which generate $n$ using $h$ distinct elements is of
a higher order than the number using repeated elements. In particular, if $n' = n + dN$ then

$$R(n, s, d) = R_{\text{distinct}}(n, s, d) + O(N^{h-2}). \tag{2.4}$$

Remark 2.3. If $(n')^{h-2} = o(N)$, the error term in Lemma 2.2 exceeds the main term.
While a more careful analysis gives a better error estimate, the bound above suffices for our
applications as the main term is summed over a large enough regime that its contribution
exceeds that of the error.

Proof. If there is at least one repeated element, there are at most $h - 2$ free choices for the
summands (we lose one choice for the repetition, and one choice as the sum must equal
$n$). Thus the contribution from representations of $n$ with a repeated element is at most
$O(N^{h-2})$. □
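
The degrees-of-freedom count in this proof can be observed numerically. The sketch below (function name ours) tabulates, for each $n$, the number of ordered tuples from $\{0, \dots, N\}$ that generate $n$ and contain a repeated entry. In the small cases we tested, these counts stay below $\binom{h}{2} (N+1)^{h-2}$; that explicit constant is our own elementary bound (choose the two equal positions, then lose one more choice to the sum) and is not a statement from the text.

```python
from collections import Counter
from itertools import product
from math import comb

def repeated_representation_counts(N, s, d):
    """Map n -> number of ordered tuples from {0..N} generating n with a repeated entry."""
    counts = Counter()
    for t in product(range(N + 1), repeat=s + d):
        if len(set(t)) < len(t):  # at least two coordinates coincide
            counts[sum(t[:s]) - sum(t[s:])] += 1
    return counts
```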

2.2. Generalizing Hegarty-Miller's Random Variables. Hegarty and Miller [HM] introduce
some useful random variables to prove their strong concentration results. We begin
with a generalization of these quantities, and then derive useful bounds which give the needed
asymptotic relations.
For a set $A$, define

$$\mathcal{A}_k := \Big\{ \big\{ \{a_1, \dots, a_h\}, \dots, \{a_{(k-1)h+1}, \dots, a_{kh}\} \big\} : \sum_{i=1}^{s} a_i - \sum_{i=s+1}^{h} a_i = \cdots = \sum_{i=(k-1)h+1}^{kh-d} a_i - \sum_{i=kh-d+1}^{kh} a_i \Big\} \tag{2.5}$$

and let $X_k = |\mathcal{A}_k|$. Note that now the ordering of elements within the $h$-tuples matters
(because subtraction is not commutative), so we are looking at unordered $k$-tuples of ordered
elements. The dependence on the ordering is, however, weak. Given any one of these $k$-tuples,
we can permute the first $s$ elements or permute the last $d$ elements without changing
the number it generates, and thus such a permutation is the same element of the $k$-tuple.
If all the elements of an $h$-tuple are distinct (actually, all we need are no repeats among
the first $s$ and no repeats among the final $d$), then there are $s! \, d!$ ways to reorder the tuple
without changing the number it generates, and thus all of these correspond to the same set
(remember, the only way the ordering matters in the set of $h$ elements is which are the first
$s$ elements and which are the last $d$). Thus, if all elements are distinct, there is overcounting
by a factor of $s! \, d!$; we must take this into account later.

We want to study these $k$-tuples because they shed light on how many repeated elements
are in the generalized sumset. We have $k$-tuples of $h_{(s,d)}$-tuples, so each $k$-tuple has a total of
$hk$ integers. We place $h_{(s,d)}$-tuples in the same $k$-tuple if they all generate the same number.
Intuitively, because we need to subtract out repeated elements, all $h_{(s,d)}$-tuples within the
same $k$-tuple only count once in our generalized sumset, so counting these $k$-tuples is equivalent
to counting $h_{(s,d)}$-tuples. To make this more concrete, we present a short example. If
$h = 3$, $s = 3$, and $d = 0$, then $\{3, 4, 7\}$, $\{5, 6, 3\}$, $\{1, 11, 2\}$, and $\{1, 5, 8\}$ would all be in
the same $k$-tuple because they all sum to 14. If these four $h_{(s,d)}$-tuples were the only $h_{(s,d)}$-tuples
that generated 14, then we would have $\mathcal{A}_4 = \{\{3, 4, 7\}, \{5, 6, 3\}, \{1, 11, 2\}, \{1, 5, 8\}\}$.
$X_k$ counts the number of $k$-tuples, so the number of times there are exactly $k$ $h_{(s,d)}$-tuples
generating the same number. For example, if we also only had 4 $h_{(s,d)}$-tuples that generated
5, and 5 and 14 were the only two numbers generated, then $X_4 = 2$ for the two different
numbers generated by exactly 4 $h_{(s,d)}$-tuples.

The reason why $\mathcal{A}_1$ is so important is that $h_{(s,d)}$-tuples are only in $\mathcal{A}_1$ if no other
tuples generate that number. We want the number of single $h_{(s,d)}$-tuples in $\mathcal{A}_1$ (counted by
$X_1$) to be the largest because then we know that most $h_{(s,d)}$-tuples generate a unique sum.
The larger $k$ is, the more constant sums we must have. We have $k$-tuples of $h_{(s,d)}$-tuples,
and within each $k$-tuple, all $h_{(s,d)}$-tuples contained inside generate the same number. If there
were another $h_{(s,d)}$-tuple that generated the same number, then the two $h_{(s,d)}$-tuples would
be in $\mathcal{A}_2$. Therefore, $X_1$ counts the number of distinct sums among our $h_{(s,d)}$-tuples, which
is important because we will show that this is of higher order than $X_k$ for any $k > 1$, so $X_1$
becomes critically important in measuring the size of the generalized sumset. So, $X_1$ counts
the number of $h_{(s,d)}$-tuples that generate a distinct sum, because if an $h_{(s,d)}$-tuple is in $\mathcal{A}_1$,
then there are no other $h_{(s,d)}$-tuples that generate the same number. Similarly, $X_2$ counts
how many $h_{(s,d)}$-tuples generate the same sum as exactly one other $h_{(s,d)}$-tuple. Therefore,
if we know that $X_2 = o(X_1)$, then we know there are significantly more $h_{(s,d)}$-tuples with a
unique sum than those with any number of repeated sums (because any $k$-tuples in $\mathcal{A}_k$ for
$k > 1$ are also in $\mathcal{A}_1$).
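
The bookkeeping can be tried on a small set. The sketch below follows one reading of the definition, in which $X_k$ is the number of unordered $k$-tuples of $h_{(s,d)}$-tuples generating a common value, so that $X_k = \sum_n \binom{r(n)}{k}$ where $r(n)$ is the number of tuples generating $n$, counted up to reordering the $s$ plus-terms and the $d$ minus-terms. This reading and the function names are our assumptions; under them, the alternating sum $\sum_{k \ge 1} (-1)^{k-1} X_k$ recovers the number of distinct generated values, since $\sum_{k=1}^{r} (-1)^{k-1} \binom{r}{k} = 1$ for every $r \ge 1$.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

def class_counts(A, s, d):
    """Map n -> r(n), the number of tuples generating n up to reordering
    the s plus-terms among themselves and the d minus-terms among themselves."""
    r = Counter()
    elems = sorted(A)
    for plus in combinations_with_replacement(elems, s):
        for minus in combinations_with_replacement(elems, d):
            r[sum(plus) - sum(minus)] += 1
    return r

def X(A, s, d, k):
    """Number of unordered k-subsets of tuple classes sharing a generated value."""
    return sum(comb(r, k) for r in class_counts(A, s, d).values())
```

For $A = \{0, 1, 2\}$ and $(s, d) = (2, 0)$ the only coincidence is $0 + 2 = 1 + 1$, giving $X_1 = 6$, $X_2 = 1$ and $|A + A| = 6 - 1 = 5$.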

The goal is to generalize Theorem 1.1 and Lemma 2.1 of [HM]. To do this we must
bound the number of repeated elements in $A_{s,d}$. If we knew that our generalized sumset
contains mostly distinct sums (so most $h_{(s,d)}$-tuples generate a distinct integer), then a simple
combinatorial argument and Chebyshev's theorem would suffice to prove Theorem 1.2. In
the case of fast decay, $\delta > \frac{h-1}{h}$, the number of repeated elements is of lower order than the
number of distinct elements. The case of critical decay, $\delta = \frac{h-1}{h}$, is more difficult because
now the number of our repeated elements is of the same order as the number of distinct
elements. Intuitively, the smaller $\delta$ is, the more elements from $\{0, \dots, N\}$ are in $A$, so the
more likely it is that two $h_{(s,d)}$-tuples generate the same element. Thus a more sophisticated
argument is needed to find the relevant cardinalities.

We first introduce some terminology. 
Definition 2.1. By Type 0 we mean the $k$-tuples with $hk$ distinct elements of $I_N$, while
Type $i$ refers to $k$-tuples with $i$ repeated elements.

By repeated elements, we mean the total number of elements that would need to be removed
for all elements to be distinct. For example, in the 7-tuple $\{1, 1, 1, 2, 2, 3, 4\}$, we say there
are three repeated elements because we would need to remove $\{1, 1, 2\}$ for all remaining
elements to be distinct. For a fixed $k$-tuple $\alpha$, since we draw our $A$ from a binomial model
with parameter $p(N) = cN^{-\delta}$, we know

$$\text{Prob}(\alpha \text{ is of Type } t) = \binom{kh}{t} c^{kh-t} N^{-\delta(kh-t)}. \tag{2.6}$$

Equation (2.6) holds because the probability of choosing any element is independent of the
probability of choosing any other element. We need a binomial coefficient because we have
to choose $t$ of the $k$-tuple's total $hk$ elements to repeat. Note that (2.6) is for a fixed $k$-tuple,
but we do not know the locations of the repeated elements, so the binomial coefficient is
necessary for all possible combinations of repeats.
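
The notion of repeated elements is easy to compute. In the sketch below (names ours), `repeated_elements` returns the number of entries that must be removed to make all entries distinct, and `inclusion_probability` returns the chance that every distinct value of a given tuple lies in $A$ when elements are chosen independently with probability $p$; a tuple with $kh$ entries and $t$ repeats has $kh - t$ distinct values, which is the source of the exponent $kh - t$. The binomial factor counting where the repeats occur is not modeled here.

```python
def repeated_elements(values):
    """Number of entries to remove so that all remaining entries are distinct."""
    values = list(values)
    return len(values) - len(set(values))

def inclusion_probability(values, p):
    """Probability that every distinct value of the tuple is chosen for A."""
    return p ** len(set(values))
```

For the 7-tuple $\{1, 1, 1, 2, 2, 3, 4\}$ this gives three repeated elements and probability $p^4$.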

Let $\xi_{i,k}(N)$ be the number of $k$-tuples of type $i$. Note that we have $k$-tuples of $h$-element
sets; in those $h$-element sets, the ordering of elements within matters a bit, though we may
permute the first $s$ or permute the last $d$ without changing the number it generates. As in
equation (2.4) of [HM],

$$E(X_k) = \sum_{i=0}^{hk-1} \xi_{i,k}(N) \, p(N)^{kh-i}. \tag{2.7}$$

This holds because we are summing over all possible types of $k$-tuples times the probability
of choosing a $k$-tuple of that type, so we get the expected number of $k$-tuples.

Similar to [HM], in the regimes we study the only contribution to (2.7) that matters is from the
first term. This is equivalent to estimating the number of $k$-tuples by only considering the
number of $k$-tuples with no repeated elements. We first estimate the contribution from this
term, and then bound the contribution from the remaining ones.



Lemma 2.4. We have 



The error above is 0(N^ h 2 ^ k 1 ^ +1 ), and bh.k is defined in (II. 8p 



Proof. Because we have to sum over all $n$ in the interval to count how many times a $k$-tuple
can generate the same number, the number of $k$-tuples of Type 0 is

$$\xi_{0,k}(N) = \sum_{n=-dN}^{sN} \binom{R(n, s, d)/(s! \, d!) + O(N^{h-2})}{k}, \tag{2.9}$$

where $R(n, s, d)$ is the number of $h_{(s,d)}$-tuples of elements in $\{0, \dots, N\}$ that generate $n$. The
error is because $\xi_{0,k}(N)$ counts distinct tuples, while $R(n, s, d)$ allows repeats; however, our
earlier analysis showed that the number of tuples with repeated elements is lower order (this
is because $h$ is fixed and $N$ tends to infinity). From the Binomial Theorem and standard
bounds on approximating binomial coefficients with the largest term (specifically, $\binom{n}{k} = \frac{n^k}{k!} + O(n^{k-1})$), we find

n=-dN ^ ' \re=-diV ^ ' / n=-dN ^ 

(2.10) 

Letting $n' = n + dN$ as before, define

$$S_j(N) := \sum_{n'=jN}^{(j+1)N} \binom{R(n, s, d)/(s! \, d!)}{k}. \tag{2.11}$$

We use the notation $S_j(N)$ for the sum over all possible $n$ in $R(n, s, d)$ with $n'$ in one of $h$ intervals of
length $N$; in $S_j(N)$, $j$ gives the index of the interval. From (2.10), we see that it is useful to
break $\xi_{0,k}(N)$ into these intervals in order to compute the total sum. To distinguish between
$R(n, s, d)$ and $S_j(N)$, recall that $R(n, s, d)$ is for a fixed $n$, while $S_j(N)$ is for a fixed interval
of length $N$.

Assume h is even (the case of h odd is similar). We first approximate R(n,s,d). Let 
j = L^fJ — 1- Assume j > 0; the case of j = follows similarly, and we mostly omit the 
details. We have 

= D-^O ^y +o^i < 2 - 12 > 

this follows from standard approximation for the binomial coefficients and the Binomial The- 
orem. 

For the rest of this subsection, in all the analysis below the error term in the 
asymptotic relations denoted by ~ are at least one order smaller in N. 

We now have 

<J+1)N /J_v^j ( i\i (h\ (n'-iN) h -\ /(J+1)N\ 

Sj (N) = Yl ( sld - i=o[ JU )+o Yl )( n - iN ) 



h-2 



k 

n'=jN K ' \ n '=jN 

~ E k ^ a ^) 

n'=jN 

We pull out all terms that do not depend on n' to get 

(i+i)jv / j , v \ k 

s ^ ~ mm=w£ H (Vri>r- tN n ■ (2 ' 14) 

9 



Our goal is to find the dependence on $s$, $d$ and $N$. To do this, we first approximate (2.14)
with an integral to get




the cost of the approximation is one order lower in $N$ as we have sums of polynomials.

We change variables by taking $x = (j + t)N$ with $t$ ranging from 0 to 1. Thus $dx = N \, dt$
and as $j > 0$ (if $j = 0$ we cannot pull out the power of $j$)

k 

(2.16) 

From our definition of $b_{h,k}$ (see (1.8)) and the fact that by symmetry it suffices to sum $n'$ up
to $hN/2$, summing over $j$ completes the proof. □

Lemma 2.5. We have

$$E(X_k) \sim \xi_{0,k}(N) \, p(N)^{hk} = b_{h,k} \, c^{hk} N^{(h-1)k+1-hk\delta}, \tag{2.17}$$

with $b_{h,k}$ defined in (1.8).

Proof. By Lemma 2.4 it suffices to show $E(X_k) \sim \xi_{0,k}(N) p(N)^{hk}$. To show that we can
estimate $E(X_k)$ by $\xi_{0,k}(N) p(N)^{hk}$, it suffices to prove that for each $\ell > 0$ we have

$$\xi_{\ell,k}(N) \, p(N)^{hk-\ell} = o\big(\xi_{0,k}(N) \, p(N)^{hk}\big). \tag{2.18}$$

From (2.6), for $\ell > 0$ the probability of being Type $\ell$ is $p(N)^{hk-\ell}$ and the number of
such $k$-tuples is $\xi_{\ell,k}(N)$. The repeated elements can either be in the same $h_{(s,d)}$-tuple or in
different $h_{(s,d)}$-tuples. In both cases we have the same order, though. We have $k$ sets of
$h$-tuples. That would give us $hk$ independent variables; however, each of the $h$-tuples must
generate the same number $n$ (so we lose $k$ degrees of freedom), and then we lose another $\ell$ by assumption (if
$\ell = 0$ we have no repeated elements, which is the main term). Thus for a fixed $n$ the number
of solutions is at most on the order of $N^{hk-k-\ell}$; summing over $n$ gives at most order $N$, for
a total contribution of at most order $N^{(h-1)k-\ell+1}$.

We now multiply by the probability $p(N)^{hk-\ell}$ and get

$$\xi_{\ell,k}(N) \, p(N)^{hk-\ell} = O\big(N^{(h-1)k-\ell+1-(hk-\ell)\delta}\big). \tag{2.19}$$

Because $\delta < 1$ and $\ell > 0$, we know that

$$\xi_{\ell,k}(N) \, p(N)^{hk-\ell} = O\big(N^{(h-1)k+1-hk\delta-\ell(1-\delta)}\big) = O\big(\xi_{0,k}(N) N^{-\delta hk} \cdot N^{-\ell(1-\delta)}\big), \tag{2.20}$$

and the factor $N^{-\ell(1-\delta)}$ tends to zero, so the probability of choosing $k$-tuples with $\ell$ repeats is of a lower order than the probability
of choosing a $k$-tuple with no repeats, completing the proof. □
2.3. Strong Concentration Results. We need to show $X_k$ is strongly concentrated about
its expected value as $N \to \infty$ to conclude that the actual number of distinct elements in
the generalized sumset approaches the expectation. We know from Lemma 2.5 that the
expected number of distinct elements is of a higher order than the expected number of
repeated elements, but if we do not know that the actual number of distinct elements is
close to its expectation, then Lemma 2.5 is of little use. Here we show that the actual
number does indeed approach its mean. This is similar to equations (2.9) and (2.10) of
[HM].

Lemma 2.6. For 5 > Xk becomes strongly concentrated about its expected value as 
N oo. 

Proof. We employ a second moment method to show that N \ kh > — o(p(N)) implies 
Xk is highly concentrated about its mean. 

Let $\Delta = \sum_{\alpha \sim \beta} P(Y_\alpha \wedge Y_\beta)$, where $\alpha \sim \beta$ if the $k$-tuples $\alpha, \beta$ have at least one number in
common, and $Y_\alpha$ is an indicator variable for each unordered $k$-tuple having a constant sum.
As $N \to \infty$, from Lemma 2.5 we know that $E(X_k) \to \infty$ so, as in equation (2.9) of [HM], it
suffices to show that

$$\Delta = o\big(E(X_k)^2\big) = o_k\big((N^{2(h-1)k+2})(c^{2hk} N^{-2hk\delta})\big). \tag{2.21}$$

The main contribution is from pairs with $hk$ distinct elements and exactly 1 element in
common. From Lemma 2.4 we have $O(N^{(h-1)k+1})$ choices for $\alpha$. There are $hk$ choices
for the common element with $\beta$, $O(N^{(h-1)k})$ choices for the rest of $\beta$, and $2kh - 1$ elements in
$\alpha \cup \beta$, so

$$P(Y_\alpha \wedge Y_\beta) = O\big(p(N)^{2kh-1}\big). \tag{2.22}$$

Generalizing equation (2.10) in [HM],

$$\Delta = \sum_{\alpha \sim \beta} P(Y_\alpha \wedge Y_\beta) = O\big(N^{(h-1)k+1+(h-1)k} \, c^{2kh-1} N^{-(2kh-1)\delta}\big) = O\big(N^{2k(h-1)+1-(2kh-1)\delta}\big). \tag{2.23}$$

Because $\delta < 1$,

$$\Delta = o_k\big(N^{2k(h-1)+2-2kh\delta}\big), \tag{2.24}$$

which proves our lemma. □

3. Phase Transition 

3.1. Fast Decay. Here we prove the first claim of Theorem 1.2. We can do this using
Chebyshev's inequality and Lemmas 2.5 and 2.6. This is equations (2.11)-(2.12) of [HM]. For
$\delta > \frac{h-1}{h}$,

$$X_1 \sim E(X_1) \sim (b_{h,1} N^{h})(c^{h} N^{-h\delta}), \qquad X_2 \sim (b_{h,2} N^{2(h-1)+1})(c^{2h} N^{-2h\delta}). \tag{3.1}$$

We get the above equations from plugging $k = 1, 2$ into Lemma 2.5. Because $\delta > \frac{h-1}{h}$ and
$X_1 = \Theta(N^{h-h\delta})$,

$$X_2 = O\big(N^{2(h-1)+1-2h\delta}\big) = O(X_1) \cdot O\big(N^{h-1-h\delta}\big), \tag{3.2}$$

where the exponent $h - 1 - h\delta$ is negative. Thus $X_2$ is of lower order than $X_1$, so as $N \to \infty$, all but a vanishing proportion of $h_{(s,d)}$-tuples
will generate distinct sums.



3.2. Critical Decay. We are now ready to prove the second claim in Theorem 1.2. The
key result in our earlier approximation of $X_k$ is that when we plug in $\delta = \frac{h-1}{h}$, all the
exponents on $N$ sum to 1, so we are left with a term on the order of $N$.
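
The exponent arithmetic behind this remark can be checked with exact rational numbers. The sketch below (function name ours) evaluates the exponent $(h-1)k + 1 - hk\delta$ of $N$ appearing in the estimate for $E(X_k)$; at $\delta = (h-1)/h$ it equals 1 for every $k$, while for larger $\delta$ it strictly decreases in $k$.

```python
from fractions import Fraction

def exponent_of_N(h, k, delta):
    """Exponent of N in the main term of E(X_k): (h - 1) k + 1 - h k delta."""
    return (h - 1) * k + 1 - h * k * delta
```

For example, with $h = 3$ and $\delta = 2/3$ the exponent is 1 for $k = 1, 2, 3, \dots$, so every $X_k$ contributes at order $N$.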
We first claim 



< X m . (3-3) 



|A S)d |-^(-l) fe -% 

k=l 

We omit the proof because of its similarity to [HM] (see equation (2.16) there).
We now want to show

$$|A_{s,d}| \sim \sum_{k=1}^{m} (-1)^{k-1} X_k. \tag{3.4}$$

To do this, we need to show that the coefficients on $X_m$ go to 0 as $k \to \infty$. The proof of
this is a rote bound; we omit the details. By our concentration result in Lemma 2.6,

$$X_k \sim E(X_k) \sim b_{h,k} \, c^{hk} N^{(h-1)k+1-hk\delta}. \tag{3.5}$$

Therefore, because $\delta = \frac{h-1}{h}$,

$$X_k \sim b_{h,k} \, c^{hk} N. \tag{3.6}$$

Following equation (2.18) of [HM] and using equation (3.3):

$$|A_{s,d}| \sim \sum_{k=1}^{m} (-1)^{k-1} X_k \sim N \sum_{k=1}^{m} (-1)^{k-1} b_{h,k} \, c^{hk}. \tag{3.7}$$

We conclude that

$$|A_{s,d}| \sim N g(c; s, d). \tag{3.8}$$

We define $g(c; s, d)$ to capture how the size of our generalized sumset
$A_{s,d}$ depends on $c$, $s$ and $d$. Unlike in the case of $A + A$ versus $A - A$, this function no longer has a nice closed
form. The function we have defined arises in the generalization of Hegarty-Miller's random
variables, and the purpose of $g(c; s, d)$ is to identify and pull out the $N$ to determine how the
size of the generalized sumset depends on $N$.

We want to compare the sizes of two sets $A_{s_1,d_1}$ and $A_{s_2,d_2}$ for $s_1 + d_1 = s_2 + d_2 = h$. The
$k, h, N$ factors are all the same and cancel, so $|A_{s_1,d_1}| / |A_{s_2,d_2}|$ depends only on $s_1, s_2, d_1, d_2$.
Therefore,

$$\frac{|A_{s_1,d_1}|}{|A_{s_2,d_2}|} \sim \frac{s_2! \, d_2!}{s_1! \, d_1!}. \tag{3.9}$$

Because $d \le s$, the maximum value of $1/(s! \, d!)$ is achieved at the minimum value of $s! \, d!$,
which occurs when $s = d$. Thus, we conclude that as $N \to \infty$, with a probability of choosing
elements decaying in $N$, the set with the most minus signs is almost surely larger. This
proves the second claim of Theorem 1.2.
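
As a quick numerical companion to the factorial ratio above, the following sketch (function names ours) computes $(s_2! \, d_2!) / (s_1! \, d_1!)$ exactly and finds the split $h = s + d$ with $d \le s$ that minimizes $s! \, d!$.

```python
from fractions import Fraction
from math import factorial

def predicted_ratio(s1, d1, s2, d2):
    """The ratio (s2! d2!) / (s1! d1!) as an exact fraction."""
    return Fraction(factorial(s2) * factorial(d2), factorial(s1) * factorial(d1))

def best_split(h):
    """The pair (s, d) with s + d = h and d <= s minimizing s! d!."""
    return min(((h - d, d) for d in range(h // 2 + 1)),
               key=lambda sd: factorial(sd[0]) * factorial(sd[1]))
```

For $h = 2$ the ratio for $(s_1, d_1) = (1, 1)$ against $(s_2, d_2) = (2, 0)$ is 2, matching the classical statement that the difference set is typically twice as large as the sumset in the sparse regime.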
3.3. Future Work: Slow Decay. We are left with the case when $\delta < \frac{h-1}{h}$. This was done
in the third case of Theorem 1.1 in [HM] for two summands, but in the general case of slow
decay it is considerably more difficult for a number of reasons. The crucial difference in the
analysis of the case of critical decay and the case of slow decay is that the case of slow decay
focuses on the number of elements missing from $A_{s,d}$, while the case of critical decay focuses
on the number of elements present in $A_{s,d}$. In the previous sections, we approximated $|A_{s,d}|$
by focusing on the middle of the interval $[-dN, sN]$ because it was here that elements were
most likely to be present. However, to measure the number of sums missing from $A_{s,d}$, we
instead need to look at the fringes of the interval, so the analysis shifts completely. Following
[HM], we would need to estimate the expectation of the number of elements missing from
the generalized sumset. In [HM], they let $\mathscr{E}_n$ denote the event that $n \notin A + A$. They can
then find the expected number of missing sums,

$$E[\mathscr{S}^c] = \sum_{n=0}^{2N} P(\mathscr{E}_n); \tag{3.10}$$

however, to find $P(\mathscr{E}_n)$, they use that all ways of representing any integer $n$ are independent
of one another. This leads to the following nice equation in [HM]:

$$P(\mathscr{E}_n) = \begin{cases} (1 - p^2)^{n/2} (1 - p) & \text{if } n \text{ is even} \\ (1 - p^2)^{(n+1)/2} & \text{if } n \text{ is odd.} \end{cases} \tag{3.11}$$

In the general case, this formula is significantly less tractable as now the various ways of
summing to $n$ all depend on one another. The probability must be conditioned on each
previous element chosen to be in the $h_{(s,d)}$-tuple, and that is the major difficulty in finding
this formula in the general case. In the next equation of [HM], they sum over the probabilities
of each $n$ in the interval:

$$E[\mathscr{S}^c] \sim 4 \sum_{m=0}^{\lfloor N/2 \rfloor} (1 - p^2)^m \sim \frac{4}{p^2}; \tag{3.12}$$

however, in the $h = 2$ case, this summation takes advantage of nice geometric series properties
which are not available in the general case, and are thus left for future work.
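
For the $h = 2$ case, the independence argument can be checked exactly on a small example. The sketch below is ours: it compares the closed form for $P(n \notin A + A)$ with an exhaustive sum over all subsets of $\{0, \dots, N\}$, restricted to $0 \le n \le N$ (the range in which the pairs $\{a, n - a\}$ all lie inside $\{0, \dots, N\}$), and evaluates the geometric partial sum in closed form. Exact rational arithmetic is used, so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product

def prob_missing_formula(n, p):
    """Closed form for P(n not in A + A), used here only for 0 <= n <= N."""
    if n % 2 == 0:
        return (1 - p * p) ** (n // 2) * (1 - p)
    return (1 - p * p) ** ((n + 1) // 2)

def prob_missing_exact(n, N, p):
    """Exact P(n not in A + A) by summing over all subsets A of {0, ..., N}."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=N + 1):
        A = {a for a, b in enumerate(bits) if b}
        if all((n - a) not in A for a in A):
            total += p ** len(A) * (1 - p) ** (N + 1 - len(A))
    return total

def geometric_partial_sum(p, M):
    """sum_{m=0}^{M} (1 - p^2)^m in closed form."""
    q = 1 - p * p
    return (1 - q ** (M + 1)) / (p * p)
```

Since the partial sums are bounded by $1/p^2$, four times the sum is at most $4/p^2$, which is the size of the main term in the slow-decay case for two summands.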